
Feat: sno leader election config#1242

Merged
openshift-merge-robot merged 2 commits into openshift:master from eggfoobar:feat_sno_leaderelection_config
Dec 8, 2021

Conversation

@eggfoobar
Contributor

SNO clusters have limited resources, and leader election requests can put unneeded pressure on the kube-apiserver. This PR increases the retry period and lease duration for leader election, and exposes helper functions to determine topology and SNO leader election defaults on operator start-up.

Signed-off-by: ehila <ehila@redhat.com>
Comment thread pkg/config/leaderelection/leaderelection.go
@eggfoobar
Contributor Author

To give a bit more context: some of the numbers I was seeing are pasted below, along with the queries run against Prometheus. The methodology was to create two 4.9 SNO-DU clusters (leader-election-new and leader-election-oc); both had CVO scaled down, and the only difference was that leader-election-new was running 11 operators rebuilt against the modified library-go with the new leader-election numbers. I was seeing some small reductions, so I figured that if we extrapolate that to the rest of the core operators, we would see better gains in SNO-DU clusters, where we have very limited resources (currently, management workloads are pinned to 2 cores).

One thing I tried to verify was the downtime support for the APIServer in SNO clusters described in this BZ: rolling out the APIServer didn't cause any restarts of the operators. Similarly, rolling out the operator didn't cause any restarts, and the operands behaved correctly without any restarts as well.

Queries Ran

95th Quantile Over Time (CPU Usage) - API Server

| Unchanged | Changed | Delta |
| --- | --- | --- |
| 478.90296182259607 | 462.6560489574072 | -3.39 |

```
quantile_over_time(
  .95,
  sum(
    rate(container_cpu_usage_seconds_total{
      container!="POD",
      pod=~"kube-apiserver-leader-election-oc-49|kube-apiserver-leader-election-new-49",
      namespace="openshift-kube-apiserver"}[5m]
    )
  )[6h:]
) * 1000
```

Average Over Time (CPU Usage) - API Server

| Unchanged | Changed | Delta |
| --- | --- | --- |
| 429.08792083196755 | 395.3039245479713 | -7.87 |

```
avg_over_time(
  sum(
    rate(container_cpu_usage_seconds_total{
      container!="POD",
      pod=~"kube-apiserver-leader-election-oc-49|kube-apiserver-leader-election-new-49",
      namespace="openshift-kube-apiserver"}[5m]
    )
  )[6h:]
) * 1000
```

95th Quantile Over Time (CPU Usage) - Operators

| Unchanged | Changed | Delta |
| --- | --- | --- |
| 192.51093951592614 | 174.9938822722223 | -9.10 |

```
quantile_over_time(
  .95,
  sum(
    rate(container_cpu_usage_seconds_total{
      container!="POD",
      namespace=~"openshift-apiserver-operator|openshift-authentication-operator|openshift-cluster-storage-operator|openshift-console-operator|openshift-controller-manager-operator|openshift-etcd-operator|openshift-kube-apiserver-operator|openshift-kube-controller-manager-operator|openshift-kube-scheduler-operator|openshift-kube-storage-version-migrator-operator|openshift-service-ca-operator"
      }[5m]
    )
  )[6h:]
) * 1000
```

Average Over Time (CPU Usage) - Operators

| Unchanged | Changed | Delta |
| --- | --- | --- |
| 173.11949276437636 | 159.50446507412303 | -7.86 |

```
avg_over_time(
  sum(
    rate(container_cpu_usage_seconds_total{
      container!="POD",
      namespace=~"openshift-apiserver-operator|openshift-authentication-operator|openshift-cluster-storage-operator|openshift-console-operator|openshift-controller-manager-operator|openshift-etcd-operator|openshift-kube-apiserver-operator|openshift-kube-controller-manager-operator|openshift-kube-scheduler-operator|openshift-kube-storage-version-migrator-operator|openshift-service-ca-operator"
      }[5m]
    )
  )[6h:]
) * 1000
```

@deads2k
Contributor

deads2k commented Nov 8, 2021

Summarizing the overall benefit to an SNO cluster over X timeframe will be helpful for weighing the tradeoff of slightly lower CPU cost at the expense of latency of lease reacquisition.

@eggfoobar
Contributor Author

Sure thing, here's a quick summary with some new numbers. I ran the clusters under load, using kube-burner to spin up some traffic against the API server and put some workloads in the cluster. The latest numbers show a pretty consistent picture of a small reduction in CPU consumption by the changed operators. On average, by increasing leader election to 60s, I'm seeing a reduction of 1-2 millicores per operator; extrapolating that to the 30 operators we run, we could see a 30-millicore reduction on average on the low end and 60 millicores on the high end. That is a very small reduction, but the gain is meaningful in environments with significantly reduced resources. Some DU configurations, for example, pin management workloads to 2 cores. This gives them very little breathing room, and any measurable reduction is useful to give CRI-O and the kubelet more resources to operate. This does mean we reduce the responsiveness of our operators during rollouts, but the gain would mean that, if need be, we can support another operator in those limited environments.

In short, with the current numbers, we would have a worst-case non-graceful handover of 330s before the new operator takes the lease, and a best-case graceful handover of 60s, which is the added latency in lease reacquisition. The upside is that this affords resource-starved SNO deployments some more breathing room (albeit small), giving the kubelet and CRI-O more headroom or potentially allowing another operator to run.

As always with things of this nature, another pair of eyes would be fantastic on the numbers, queries and the methodology. Any input is greatly appreciated.

```go
// This method should only be called when running in an SNO Cluster.
func LeaderElectionSNOConfig(config configv1.LeaderElection) configv1.LeaderElection {

// We want to make sure we respect a 30s clock skew as well as 4 retry attempts without making
```
Contributor

Can you provide color into the 30s skew and why this duration was chosen? In the case of SNO where we are essentially talking about a single clock on the control plane, it's not directly apparent where the skew would come from in regards to a lease.

Contributor Author

That is a great call-out. The idea there was to keep that mostly intact, as it's not clear to me whether there will be SNO clusters that have a worker machine, and what that arrangement might mean for where operators land. I figured I'd err on the side of caution and keep with the set convention.


@soltysh soltysh left a comment


One nit, otherwise this lgtm

```go
} else if infraStatus.ControlPlaneTopology == configv1.SingleReplicaTopologyMode {
	snoLeaderElection := leaderelectionconverter.LeaderElectionSNOConfig(*b.leaderElection)
	b.leaderElection = &snoLeaderElection
}
```


Shouldn't that block move to WithleaderElection, since it's possible that LE is disabled and this code could accidentally turn it on if we're not careful enough?

Contributor Author

@eggfoobar eggfoobar Nov 23, 2021


That's a good point. The reason it's there is that we won't know the topology we're in until runtime. The function currently won't alter the disable flag in LE, but to make it more robust I changed the if statement to require topology == SingleReplica AND LE not disabled, and added some tests to ensure the behavior. Do you think that's a good enough safeguard against accidentally turning it on?
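A rough sketch of that safeguard, using local stand-in types rather than the real builder and configv1 structs:

```go
package main

import "fmt"

// LeaderElection is a simplified stand-in for configv1.LeaderElection;
// the timing fields are elided since only Disable matters for this check.
type LeaderElection struct {
	Disable bool
}

// TopologyMode is a stand-in for configv1.TopologyMode.
type TopologyMode string

const SingleReplicaTopologyMode TopologyMode = "SingleReplica"

// shouldApplySNODefaults mirrors the guard described above: SNO timings are
// applied only when the control-plane topology is single-replica AND the
// user has not disabled leader election, so the override can never flip the
// Disable flag back on.
func shouldApplySNODefaults(topology TopologyMode, le LeaderElection) bool {
	return topology == SingleReplicaTopologyMode && !le.Disable
}

func main() {
	fmt.Println(shouldApplySNODefaults(SingleReplicaTopologyMode, LeaderElection{}))              // true
	fmt.Println(shouldApplySNODefaults(SingleReplicaTopologyMode, LeaderElection{Disable: true})) // false
	fmt.Println(shouldApplySNODefaults("HighlyAvailable", LeaderElection{}))                      // false
}
```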

Contributor

@deads2k deads2k Dec 8, 2021


This code is still in a spot that does not have access to the user's intent. @soltysh's comment looks unaddressed to me.

EDIT: I see the value being saved in WithleaderElection

@eggfoobar
Contributor Author

eggfoobar commented Nov 23, 2021

/test unit

1 similar comment
@eggfoobar
Contributor Author

/test unit


@soltysh soltysh left a comment


/lgtm
deferring to @deads2k for approval

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Nov 26, 2021
Comment thread pkg/controller/controllercmd/builder.go Outdated
```go
klog.Warningf("unable to get cluster infrastructure status for leader election (falling back to default): %v", err)
} else if infraStatus.ControlPlaneTopology == configv1.SingleReplicaTopologyMode && !b.leaderElection.Disable {
	snoLeaderElection := leaderelectionconverter.LeaderElectionSNOConfig(*b.leaderElection)
	b.leaderElection = &snoLeaderElection
```
Contributor


This stomps on code-configured leader election if someone specified their actual values. Only in WithleaderElection do you have sufficient information to know what the user provided, so that you don't overwrite it.

How you resolve this is up to you/@soltysh, but you cannot unconditionally stomp the user's intent.

Contributor Author


I altered this a bit: I added a flag approach to keep it straightforward and introduce fewer moving parts. The idea is that if the user provides no timing configs in WithleaderElection, then it's safe to assume we can change those configs, so we fetch the infraStatus to see if we should alter the timings for SNO. @soltysh, what do you think about this approach?



How you resolve this is up to you/@soltysh, but you cannot unconditionally stomp the user's intent.

@deads2k that was also my suggestion here: https://github.com/openshift/library-go/pull/1242/files#r754222946

Contributor Author


With the recent change, @soltysh, I'm using the zero values in the timings to ensure that if the user has customized the timings, we won't alter them at runtime. That should make sure we don't stomp on the user's intent; if no values were provided, we can query the cluster and change the values if SNO is detected. That's going to be triggered here.

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 2, 2021
Comment thread pkg/config/clusterstatus/clusterstatus.go Outdated
Comment thread pkg/controller/controllercmd/builder.go Outdated
@deads2k
Contributor

deads2k commented Dec 8, 2021

/approve

please squash before merge.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 8, 2021
added functions to help default leader election config for sno clusters
updated controller builder to use sno leader election when user provides no options and we are currently running in sno control plane topology

Signed-off-by: ehila <ehila@redhat.com>

feat: added cluster infra status check for leader election

added check for sno topology for leader election config

Signed-off-by: ehila <ehila@redhat.com>

upkeep: updated warning message for clarity

Signed-off-by: ehila <ehila@redhat.com>

docs: added calculations and context

added calculations for the sno leader election numbers
added more comments around method to provide context and reasoning

Signed-off-by: ehila <ehila@redhat.com>

feat: added LE disable check and tests

added check for LE disable flag and SingleReplica before calling SNO leader election method
added tests for LeaderElectionSNOConfig and LeaderElectionDefaulting for expected behavior

Signed-off-by: ehila <ehila@redhat.com>

test: fixed test case for defaults

Signed-off-by: ehila <ehila@redhat.com>

test: changed unit test to use reflect package

changed unit test for equality check to use reflect.DeepEqual instead of assert.Equal to follow convention of other unit tests

Signed-off-by: ehila <ehila@redhat.com>

test: fix unit test when running in cluster

fixed unit test that reads in namespace from cluster as fallback

Signed-off-by: ehila <ehila@redhat.com>

fix: added nil check for infraStatus and error out if not populated

Signed-off-by: ehila <ehila@redhat.com>

fix: added flag to derive user intent on leader election

added a flag to signal if the user supplied any leader election config
added logic to allow us to override the leader election if the user has provided no configs
added unit tests for sno config logic method

Signed-off-by: ehila <ehila@redhat.com>

feat: removed unneeded method

removed GetClusterInfraStatusOrDie method, not needed

Signed-off-by: ehila <ehila@redhat.com>

refactor: changed variable name and logging message

updated flag variable name to be more explicit
inverted logic to match variable name
updated messaging to match for event recorder and logging statement

Signed-off-by: ehila <ehila@redhat.com>

feat: changed flag logic to match recommendation

Signed-off-by: ehila <ehila@redhat.com>
@eggfoobar eggfoobar force-pushed the feat_sno_leaderelection_config branch from aa75fd2 to 2612981 Compare December 8, 2021 21:25
@deads2k
Contributor

deads2k commented Dec 8, 2021

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2021
@openshift-ci
Contributor

openshift-ci Bot commented Dec 8, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, eggfoobar, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 9b73bdc into openshift:master Dec 8, 2021
